@bovlb bovlb commented Aug 4, 2025

The key feature added by this PR is to support queries of the form:

SELECT * FROM "crawl-to-rag"
WHERE _find_similar = FIND_SIMILAR(text:='find entity', k:= 10) AND _blobs;

Here we take a text string, embed it using the same model that was used to create the descriptor set, and pass the resulting vector to the FindDescriptor call.
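
As a rough illustration of that flow (not the code from this PR), the sketch below embeds a string and builds a FindDescriptor query with the vector attached as a float32 blob. The helper build_find_descriptor and the toy_embed function are hypothetical; the real embedder lives in embeddings.py and is not reproduced here. The parameter names (set, k_neighbors, distances, results) follow the ApertureDB query language as I understand it, and treating "crawl-to-rag" as the descriptor set name is an assumption.

```python
import json
import struct
from typing import Callable, List, Sequence, Tuple


def build_find_descriptor(
    text: str,
    descriptor_set: str,
    k: int,
    embed: Callable[[str], Sequence[float]],
) -> Tuple[List[dict], List[bytes]]:
    """Embed `text` and build a FindDescriptor query plus its blob payload."""
    vector = embed(text)
    # The query vector travels as a blob of packed little-endian float32 values.
    blob = struct.pack(f"<{len(vector)}f", *vector)
    query = [
        {
            "FindDescriptor": {
                "set": descriptor_set,
                "k_neighbors": k,
                "distances": True,
                "results": {"all_properties": True},
            }
        }
    ]
    return query, [blob]


def toy_embed(text: str) -> List[float]:
    # Hypothetical stand-in for the model that created the descriptor set.
    return [float(len(text)), float(text.count(" ")), 1.0, 0.0]


query, blobs = build_find_descriptor("find entity", "crawl-to-rag", 10, toy_embed)
print(json.dumps(query, indent=2))
```

The important constraint is the one stated above: the embedding function must be the same model that produced the stored descriptors, otherwise the distances are meaningless.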

This PR also refactors how we control query execution. Control is now rooted in the configuration supplied in import_schema, and the execute method is now much simpler, relying on callback functions. This was a little tricky to do: the configuration has to be communicated as text strings to a different invocation environment, which makes it hard to pass function pointers.
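
To make the "callbacks as text strings" problem concrete, here is a minimal sketch of one way to do it. It is not the Curry class from common.py; the registry, the to_string/from_string methods, and the set_limit callback are all hypothetical names chosen for illustration. The idea is to store a function name plus pre-bound arguments as JSON, then rebuild the callable on the other side.

```python
import json
from typing import Any, Callable, Dict

# Callbacks that may be referenced by name from serialized options.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(func: Callable[..., Any]) -> Callable[..., Any]:
    """Make `func` available to Curry under its own name."""
    _REGISTRY[func.__name__] = func
    return func


class Curry:
    """A named callback with pre-bound arguments that round-trips via a string."""

    def __init__(self, name: str, *args: Any, **kwargs: Any) -> None:
        if name not in _REGISTRY:
            raise KeyError(f"unknown callback: {name}")
        self.name = name
        self.args = list(args)
        self.kwargs = kwargs

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Pre-bound arguments come first, call-time arguments follow.
        return _REGISTRY[self.name](*self.args, *args, **self.kwargs, **kwargs)

    def to_string(self) -> str:
        payload = {"name": self.name, "args": self.args, "kwargs": self.kwargs}
        return json.dumps(payload)

    @classmethod
    def from_string(cls, text: str) -> "Curry":
        data = json.loads(text)
        return cls(data["name"], *data["args"], **data["kwargs"])


@register
def set_limit(limit: int, command: dict) -> dict:
    """Example callback: return a copy of the command body with a limit set."""
    return {**command, "limit": limit}


# The string is what would be stored in the table options.
option_value = Curry("set_limit", 10).to_string()
restored = Curry.from_string(option_value)
print(restored({"with_class": "Person"}))
```

Only JSON-serializable arguments survive the round trip, which is the price of passing behaviour through text options.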

I have also added a special boolean parameter _blobs for any table that can return blobs, corresponding to the blobs parameter. This means that blobs are no longer returned just because the user says SELECT *. Note that making _blobs work required me to add some features to the underlying multicorn2 library. See pgsql-io/multicorn2#78
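
A minimal sketch of how such a flag might be consumed, assuming the condition reaches the wrapper as an ordinary qualifier of the form _blobs = true. That shape is an assumption; the actual representation depends on the multicorn2 changes referenced above. The Qual tuple is a stand-in mirroring multicorn's field_name, operator and value attributes, and split_blobs_qual is a hypothetical helper.

```python
from typing import Any, Iterable, List, NamedTuple, Tuple


class Qual(NamedTuple):
    """Stand-in for a multicorn qualifier (assumed shape)."""
    field_name: str
    operator: str
    value: Any


def split_blobs_qual(quals: Iterable[Qual]) -> Tuple[bool, List[Qual]]:
    """Separate the special _blobs flag from the ordinary filter conditions."""
    want_blobs = False  # default: SELECT * alone does not fetch blobs
    remaining: List[Qual] = []
    for qual in quals:
        if qual.field_name == "_blobs":
            want_blobs = qual.operator == "=" and qual.value is True
        else:
            remaining.append(qual)
    return want_blobs, remaining


quals = [Qual("_blobs", "=", True), Qual("label", "=", "cat")]
want_blobs, remaining = split_blobs_qual(quals)
# The flag maps onto the blobs parameter of the command being built.
command = {"FindImage": {"blobs": want_blobs, "results": {"all_properties": True}}}
print(command["FindImage"]["blobs"], remaining)
```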

I also added a rudimentary EXPLAIN feature that shows the AQL in the SQL interface.
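
For context, multicorn lets a foreign data wrapper contribute lines to EXPLAIN output by returning strings from its explain method, as I understand the API. The sketch below only shows the formatting half: a hypothetical explain_lines helper that renders a query as indented JSON, one string per line. It is not the implementation in this PR.

```python
import json
from typing import List


def explain_lines(query: List[dict]) -> List[str]:
    """Render a query as a header followed by pretty-printed JSON lines."""
    text = json.dumps(query, indent=2)
    return ["ApertureDB query:"] + text.splitlines()


example_query = [{"FindDescriptor": {"set": "crawl-to-rag", "k_neighbors": 10}}]
for line in explain_lines(example_query):
    print(line)
```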

@bovlb bovlb requested a review from Copilot August 4, 2025 04:45

@bovlb bovlb requested a review from Copilot August 4, 2025 05:21

Copilot AI left a comment

Pull Request Overview

This PR adds support for "find similar" search functionality, allowing text and vector-based similarity queries in ApertureDB through SQL. The implementation enables queries like SELECT * FROM "crawl-to-rag" WHERE _find_similar = FIND_SIMILAR(text:='find entity', k:= 10) AND _blobs.

Key changes include:

  • Implemented find similar search with text embedding support
  • Refactored query execution to use callback-based configuration instead of direct parameter passing
  • Added special _blobs parameter to control blob return behavior instead of automatic inclusion in SELECT *

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| base/docker/scripts/embeddings/embeddings.py | Added check_properties method to validate embedder properties |
| apps/sql-server/fdw/fdw/table.py | New module defining TableOptions with callback support for command modification |
| apps/sql-server/fdw/fdw/system.py | Refactored system table creation with new callback architecture |
| apps/sql-server/fdw/fdw/entity.py | Updated entity table creation to use new TableOptions structure |
| apps/sql-server/fdw/fdw/descriptor.py | Added find similar functionality with embedding support and vector operations |
| apps/sql-server/fdw/fdw/connection.py | Updated connection table creation for new architecture |
| apps/sql-server/fdw/fdw/common.py | Introduced Curry class for serializable callbacks and removed old table/column options |
| apps/sql-server/fdw/fdw/column.py | New module with ColumnOptions and blob handling utilities |
| apps/sql-server/fdw/fdw/init.py | Major refactor of FDW execution logic with callback-based query building |
| apps/sql-server/app/sql/functions.sql | Added FIND_SIMILAR SQL function for similarity queries |
| apps/sql-server/Dockerfile | Updated to install embeddings dependencies and use custom multicorn2 branch |
| apps/rag/Dockerfile | Added embeddings dependencies installation |
| apps/crawl-to-rag/Dockerfile | Added embeddings dependencies installation |

@drewaogle drewaogle left a comment

looks good

@bovlb bovlb merged commit 1a70dd0 into main Aug 5, 2025
18 checks passed
@bovlb bovlb deleted the sql-server5 branch August 5, 2025 00:29

gsaluja9 commented Aug 5, 2025

Docker images for version v2025.8.5 were built and pushed after this PR was merged. View workflow run
